# **Computer Arithmetic**

- ALU design
- Multiplication Booth Encoding
- Floating-Point Numbers



### Full Adder

review

### Recall

Adding 2's Complement Binary Numbers

$$P_3 P_2 P_1 P_0 Y_3 Y_2 Y_1 Y_0$$



$$c_{out} = a b + a c_{in} + b c_{in}$$
  
 $sum = a xor b xor c_{in}$ 

#### Note 2-gate delays

- N-bit ripple-carry adder: a series connection of n FA's
- Sign-extension: repeating sign bit to fill up higher bits,
   e.g. 16 to 32bits
- Overflow:

Carry into MSB(Most Significant Bit) is a sign-bit Carry out of MSB is also a sign-bit for (N+1) bit representation They should be the same, otherwise OVERFLOW.

### Multiplication

- More complicated than addition
   accomplished via shifting and addition
- More time and more area
- Let's look at a version based on grade-school algorithm

$$\begin{array}{c} 0010 \quad \text{(multiplicand)} \\ \underline{x} \quad 1011 \quad \text{(multiplier)} \end{array}$$

◆ Negative numbers: convert and multiply

# **Unsigned Integer Multiplication**

- Add shift.  $B \times A => P \parallel A$ .
- ♦ LSB of A =  $1 \rightarrow P = P + B$ .
- ◆ Shift right Carry & P & A by 1 bit.



### Unsigned Integer Multiplication - Example

◆ 0010x1011: BxA=P

```
1011
  0000
                   0010
+ 0010
  0010
           1011 (and then shift right)
           0101
  0001
+ 0010
  0011
           0101 (and then shift right)
  0001
           1010
+ 0000
  0001
           1010 (and then shift right)
           1101
  0000
+ 0010
           1101 (and then shift right)
  0010
                   final answer in P||A.
           0110
  0001
```

```
0010
x 1011
  0010
 0010
0000
0010
0010110
```

## **Unsigned Integer Multiplication**

◆How about negative number multiplication?

$$\rightarrow$$
 X.Y; |X|.|Y|. g1

Check  $sign(X) \neq sign(Y) \rightarrow -|X| \cdot |Y|$ .

◆Checking sign is overhead!

g1 just do the previous slide algorithm with 2's complement, then it's OK

Booth is to reduce overhead not just sign bit check but also reducing additions.  $_{\rm ghlee,\ 2015-03-31}$ 

### Booth's Algorithm for 2's Complement. Numbers

- ◆ Consider Y·X, where X = 0111
   Since, 0111 = 1000 0001,
   Y·X = Y·(1000) Y·(0001)
- Subtract Y and Add 8 times Y (i.e. shift 3 times). This req. 1 add, 1 sub. & 4 shifts.
- Recording the above 0111 = 1000 − 0001=
   100 1 is called Booth Recording (in radix 2)

## **Booth Recording**

• Let  $X = X_{n-1} \cdots X_0$  for Y. X At i <sup>th</sup> step of multiplication,  $(X_{-1} = 0)$ 

| Xı | X <sub>I-1</sub> | Recording | Algorithm  | Note             |
|----|------------------|-----------|------------|------------------|
| 1  | 0                | 1         | -Y & shift | Beginning of 1's |
| 1  | 1                | 0         | Shift      | Middle of 1's    |
| 0  | 1                | 1         | +Y & shift | End of 1's       |
| 0  | 0                | 0         | shift      | Middle of 0's    |

### Example

```
Y. X \rightarrow P \parallel X with Y = -5, X = -6 in 4 bit
 Р
      X
          10100; X_0 = X_{-1} = 0 (shift right)
0000
     0 1 0 \frac{10}{10} ; X _1 = 1 \neq X_0 = 0 ( subtract)
0000
1011
1101 1010 ; shift right
1110 110\underline{10}; X_3 = 1, X_2 = 0: (subtract)
0 0 1 1 1 1 1 0 1 ; & shift right
0001 1110
                   : 30 = -5 \times -6
```

### Example

```
Y = 5, X = -6 in 4 bit
 Р
         X
             101\underline{00} ; X_0 = X_{-1} = 0 (shift right)
0000
       0 1 0 \underline{10} ; X_1 = 1 \neq X_0 = 0 ( subtract)
0000
      0 1 0 1 ; & shift right (with sign-bit)
1011
             10101 ; X_2 = 0 , X_1 = 1 : Add
1101
0101
       1010 ; shift right
0010
            0 \, 1 \, 0 \, 1 \, 0 ; X <sub>3</sub> = 1 , X <sub>2</sub> =0 : (subtract)
0001
            0101 ;
1011
                    ; shift right
       0101
1100
1110
       0010 : -30 = 5 \times -6
```

## **Booth Recording**

- Booth recording takes care of negative multiplier
- In the example

Note
Hope to have less add/sub:
alternating 0 and 1 makes worst case

g2 so -6=1010 is the worst case! ghlee, 2015-03-31

# **Faster Array Multiplier**

Instead of doing the adding partial products linearly, i.e. one after another partial product, can be done in parallel by grouping partial products

Each of group has three partial products
Wallace Tree Multiplier
Each group of two partial products
Binary Tree Multiplier

|                     | ${\sf m^3}_3$  | •              | · ·            | m <sup>1</sup> <sub>2</sub><br>m <sup>2</sup> <sub>1</sub> | $m_{1}^{1}$    | m <sup>0</sup> <sub>1</sub><br>m <sup>1</sup> <sub>0</sub> | m <sup>0</sup> <sub>0</sub> |
|---------------------|----------------|----------------|----------------|------------------------------------------------------------|----------------|------------------------------------------------------------|-----------------------------|
| +<br>P <sub>7</sub> | P <sub>6</sub> | P <sub>5</sub> | P <sub>4</sub> | P <sub>3</sub>                                             | P <sub>2</sub> | P <sub>1</sub>                                             | P <sub>0</sub>              |

### Division

 Can be done in reverse of multiplication of shift and add
 shift-left and subtract

### **Faster Division uses**

- the fact  $x/y = x^*(1/y)$  iterative guess of 1/y
- multiple quotient bits with precalculated table:
   SRT division

read Text 3.4(p.178~182)

### Floating Point Numbers

- We need a way to represent
  - o numbers with fractions, e.g., 3.1416
  - o very small numbers, e.g., .00000001
  - o very large numbers, e.g., 31,557,600,000,000
- Normalized Representation:
  - No leading zeros and one significant digit before decimal point: (sign)m<sub>x</sub>yz.... X 10<sup>e</sup>

```
e.g. +3.15576 \times 10^9
```

sign: +

significand: 3.15576

exponent: 9

number base: 10, i.e decimal

## Floating Point Numbers

### For computer floating point:

Number base: 2 (2<sup>n</sup>)

Sign: 0 (+) or 1(-)

Exponent: actual exponent + bias

Significand: binary number (with Sign bit)

(-1)<sup>sign</sup> • significand • 2<sup>exponent</sup>

- bias: to make smallest exponent 0
   comparing exponents and checking 0 more
   convenient with biased representation
- o more bits for significand gives more accuracy
- o more bits for exponent increases range
- Mantissa (or Fraction): normalized with no significant digit left to the decimal point 0.315576x10<sup>10</sup>

### Floating Point Numbers

- ◆ (-1)<sup>sign</sup>x(significand)x2<sup>exponent</sup>
- Issues
  - o Format: which is which and how many bits, number base (2<sup>n</sup>)
  - Handling of exceptions
    - v Overflow, Underflow, very small numbers
  - Rounding
    - ∀ How to round up numbers?

# IEEE 754 floating point standard

# IEEE 754 floating point standard

- o single precision: 8 bit exponent, 24 bit significand (23+1)
- o double precision: 11 bit exponent, 53 bit significand (52+1)
- Do not represent 1 before the binary point: It's always 1!
- Exponent Bias: to make smallest negative exponent 0
  - √ 127 for single precision; 0 ~ 255 with 255 (all 1's) reserved
  - v 1023 for double precision
    - v all 1's for exponent: reserved for special cases

### Note: some specials in IEEE754

- o zero: zero significand and zero exponent
- 0/0: NaN (nozero significand, all 1's for exponent)
- n/0: infinity (zero significand, all 1's for exponent)
- Denormal: close to zero (nonzero significand with 0 exponent)
  - v Without denormal, more possibility of Underflow

## IEEE 754 floating-point standard

### ◆ Example:

## Floating Point Complexities

- Operations are somewhat more complicated
- Needs to compare exponents and align them by shifting significand
- Needs to work on significand and exponent separately
- Needs to normalize the result for correct representation

```
e.g. (read Text p. 198)
Add 0.5 and -0.4375 in binary Floating Point with 4-bit significand
```

$$0.5 = 1.000 \text{ x } 2^{-1}$$
  
 $-0.4375 = -7/16 = -1.110 \text{ x } 2^{-2}$ 

- 1. Shift lesser exponent operand: -0.111x 2<sup>-1</sup>
- 2. Add significands: 1.000 0.111 = 0.001
- 3. Normalize  $1.000 \times 2^{-3} \times 2^{-1} = 1.000 \times 2^{-4}$

G. Lee

CNCE250-Slide 3.19

# Floating point addition



## Floating Point Complexities

- In addition to overflow we can have "underflow"
- Accuracy can be a big problem
  - o IEEE 754 keeps two extra bits, guard and round representing two additional bits for better precision plus one sticky bit (the bit moved out when shifted for exponent alignment) representing any non-zero bit beyond the precision (read Text p.206&208)
  - o four rounding modes: up/down, truncate, nearest even.
  - o positive divided by zero yields "infinity"
  - o zero divide by zero yields "not a number"
  - o other complexities
- Implementing the standard can be tricky
  - o see text for description of 80x86 and Pentium bug!

0.5

0.51

0.501

0.5000003

### MIPS ALU

- We can build an ALU to support the MIPS instruction set
  - o key idea: use multiplexor to select the output we want
  - we can efficiently perform subtraction using two's complement
  - we can replicate a 1-bit ALU to produce a 32-bit ALU
- Important points about hardware
  - all of the gates are always working
  - o the speed of a gate is affected by the number of inputs to the gate
  - o the speed of a circuit is affected by the number of gates in series (on the "critical path" or the "deepest level of logic")
- Our primary focus: comprehension, however,
  - Clever changes to organization can improve performance (similar to using better algorithms in software)
  - we'll look at examples for addition and multiplication

## Building a 32 bit ALU

### Support AND, OR, Add/Sub





# Tailoring the ALU to the MIPS



- ◆ Adding more and select through Mux per opcode
  - Need to support test for equality (beq \$t5, \$t6, 0x100)

v use subtraction: (a-b) = 0 implies a = b

| Name | Format | Example |        |        | Comment s  |        |       |                   |
|------|--------|---------|--------|--------|------------|--------|-------|-------------------|
|      |        | 6bits   | 5 bits | 5 bits | 5 bits     | 5 bits | 6bits |                   |
| add  | R      | 0       | 2      | 3      | 1          | 0      | 32    | add \$1,\$2,\$3   |
| sub  | R      | 0       | 2      | 3      | 1          | 0      | 34    | sub\$1\$2,\$3     |
| addi | I      | 8       | 2      | 1      | 100        |        |       | addi \$1,\$2,100  |
| addu | R      | 0       | 2      | 3      | 1          | 0      | 35    | subu \$1,\$2,\$3  |
| and  | R      | 0       | 2      | 3      | 1          | 0      | 36    | and \$1,\$2,\$3   |
| or   | R      | 0       | 2      | 3      | 1          | 0      | 37    | or \$1, \$2, \$3  |
| lw   | I      | 35      | 2      | 1      | 100        |        |       | lw \$1,100(\$2)   |
| SW   | I      | 43      | 2      | 1      | 100        |        |       | sw \$1,100 (\$2)  |
| beq  | I      | 4       | 1      | 2      | 25         |        | _     | beq \$1, \$2, 100 |
| j    | J      | 2       | 2500   |        | j 10 0 0 0 |        |       |                   |

# What about subtraction (a - b)?

- Two's complement approach: just negate b and add.
- ◆ A clever solution: negate+1 = 2's complement
   negate signal = carry-in for least significant bit position





B xor 1 = not B

## Tailoring the ALU to the MIPS

- Need to support test for equality (beq \$t5, \$t6, L)
  - use subtraction: (a-b) = 0 implies a = b
- to support the set-on-less-than instruction (slt \$r1, \$r2, \$r3)
  - o remember: slt is an arithmetic instruction
  - o produces a 1 if rs < rt and 0 otherwise
  - use subtraction: (a-b) < 0 implies a < b</p>

## Supporting slt

Can we figure out the idea?



◆ If a<b then (a-b)<0</p>

So, if after subtraction, the result is Negative, i.e. MSB=1

Then

LSB of Result = MSB and other bits of Result = 0



# Supporting slt & overflow



### Test for equality

### •Note: zero is 1 when the result is zero!

### Notice control lines:

0000 = and

0001 = or

0010 = add

0110 = subtract

0111 = slt

(see Sec. 4.4. p.247)



for beg, subtraction yields 0. all bit positions will be o. so negating them makes 1. for sub and slt we need subtraction, so Bnegate will be 1 and it will feed to the lsb as a carry-in to have two's comlement. ghlee, 2015-03-31

## **Chapter Three Summary**

- Computer arithmetic is constrained by limited precision
  - May Not be asscociative (a+b)+c ≠ a+(b+c)
- Bit patterns have no inherent meaning but standards do exist
  - o two's complement
  - IEEE 754 floating point
- Computer instructions determine "meaning" of the bit patterns

For Fast Adder, read the supplemental slides posted!

We are ready to move on (and implement the processor)!